ABSTRACT


Using the convenient second-order interval properties of a two-state semi-Markov model for a univariate point process, an automated technique for the estimation of the parameters in the model was researched and discussed. The power spectral density of intervals was estimated by the periodogram and a Kolmogorov-Smirnov test of fit was conducted. The asymptotic exponential distribution and independence of the periodogram points were used to calculate an approximate likelihood function. A system of equations was then formed to find the maximum likelihood estimates of the parameters. Since closed-form solutions for the estimates could not be found, an iterative method to stabilize initial guesses of the parameter values was attempted with only limited success. Results on using Kolmogorov-Smirnov type statistics and the spectrum of intervals to test the fit of stochastic process models to data have also been obtained.
I. INTRODUCTION


It is in the nature of the Operations Research approach to the study of problems to attempt the construction of a mathematical model for the problem. Subclasses of mathematical models include stochastic models, i.e. those utilizing random variables, and deterministic models. If a stochastic model seems appropriate and a general model is proposed, it remains necessary to estimate the parameters of the model from data. Parameter estimates, as well as the general form of the model, usually come from detailed analysis of data observed from the problem or process under investigation.

Several techniques utilizing observed data exist for the estimation of parameters of stochastic models. Typically the method of moments or the method of maximum likelihood is used, and both usually yield estimates with some desirable properties. Methods such as these frequently require the simultaneous solution of a system of equations in order to find estimates. A number of computer approximation routines have been developed for the solution of such systems, but their usefulness seems limited.

One proposed stochastic model provided the impetus for this research. Lewis and Shedler [1973], while studying page reference patterns in a demand paged computer system, formulated a univariate two-state semi-Markov model for the process of page exceptions. Page exceptions occur because a computer program which is in execution has been stored in blocks of storage called pages. Some of these pages must be in core storage for the program to be executing, while the remaining pages may be located on peripheral storage devices. Following the execution of each instruction a page is referenced which contains the next instruction. If this referenced page is in core storage, execution continues; however, if the referenced page is not in core storage, execution is interrupted and the referenced page must be read into core storage. This type of interruption is referred to as a page exception. Data for this process were generated by counting the number of page references occurring between page exceptions. Lewis and Shedler [1973] discussed their procedure for estimating parameters, which they described as an ad hoc method, and concluded that there was a need to formalize the parameter estimation procedure.

The purpose of the research in this thesis was to utilize the convenient second-order interval properties of a univariate two-state semi-Markov process to produce an automated, computer-programmed technique for the estimation of parameters for the model. This was desirable because the ad hoc method used by Lewis and Shedler [1973] was very time-consuming, and there exists a considerable body of page exception data which it is desired to analyze. The basic procedure was to calculate an estimate of the power spectral density of the process, namely the periodogram, and to utilize an approximate method of maximum likelihood to estimate the parameters.

It will be seen that the proposed procedure did not work as well as hoped, but the problems which arose pointed to other possible attacks on the problem. It should also be noted that model fitting and parameter estimation for these point processes is almost a completely open field.





II. BACKGROUND ANALYSIS 


A. TWO-STATE SEMI-MARKOV MODEL FOR UNIVARIATE POINT PROCESS 

Excellent discussions of this model can be found in Cox 
and Lewis [1966,Ch.7] and Lewis and Shedler [1973]. Those 
discussions are summarized here for continuity of exposition. 

Let the sequence of random variables $\{X_i, i = 1, \dots, N\}$ be interevent times, i.e. $X_i$ is the interevent time between event $(i-1)$ and event $i$. In order that a discussion of equilibrium distributions may be avoided, it was assumed that a hypothetical event has occurred at time zero, so that $X_1$, the interval between time zero and the first event, is an observation from the same process as the remainder of the sequence, i.e. there is no length-biased sampling [Cox and Lewis 1966, Ch.4] included.

Now suppose there are two types of intervals but that the interval type is not observable, i.e. a univariate point process. The two interval types have probability mass functions (p.m.f.'s) $p_1(x)$ and $p_2(x)$, respectively, with transitions between types described by a two-state Markov chain with matrix

$$A = \begin{pmatrix} \alpha_1 & 1-\alpha_1 \\ 1-\alpha_2 & \alpha_2 \end{pmatrix}.$$

That is, given that $X_i$ has p.m.f. $p_1(x)$, then $X_{i+1}$ has p.m.f. $p_1(x)$ with probability $\alpha_1$ and p.m.f. $p_2(x)$ with probability $1-\alpha_1$, independent of the history of previous intervals, etc.





The vector of steady-state probabilities $\pi = (\pi_1, \pi_2)$ associated with the transition matrix $A$ results from the solution of the matrix equation $\pi = \pi A$, and it follows that

$$\pi_1 = \frac{1-\alpha_2}{2-\alpha_1-\alpha_2}, \qquad \pi_2 = \frac{1-\alpha_1}{2-\alpha_1-\alpha_2}.$$

If $\mu_1, \sigma_1^2$ and $\mu_2, \sigma_2^2$ are the mean and variance for intervals with p.m.f. $p_1(x)$ and $p_2(x)$, respectively, the steady-state marginal results for intervals between events in the univariate process, i.e. interval type not known, are as follows:

$$p(x) = \pi_1 p_1(x) + \pi_2 p_2(x),$$

$$\mu = E(X) = \pi_1\mu_1 + \pi_2\mu_2,$$

$$\sigma^2 = \operatorname{Var}(X) = \pi_1\sigma_1^2 + \pi_2\sigma_2^2 + \pi_1\pi_2(\mu_1-\mu_2)^2.$$

The serial correlation coefficients of lag $k$ for the intervals are of the form $\rho_k = m\beta^k/\sigma^2$, $k = 1, 2, \dots$, where

$$m = \pi_1\pi_2(\mu_1-\mu_2)^2, \qquad \beta = \alpha_1+\alpha_2-1.$$
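From these coefficients the positive portion of the power spectral density may be computed,

$$P_+(\omega) = \frac{\sigma^2}{\pi}\Big\{1 + 2\sum_{k=1}^{\infty}\rho_k\cos(k\omega)\Big\}, \qquad 0 \le \omega \le \pi.$$

The closed-form solution to the infinite series is given by Jolley [1961, series #545], yielding

$$P_+(\omega) = \frac{\sigma^2}{\pi}\Big\{1 + \frac{2m\beta}{\sigma^2}\,\frac{\cos\omega - \beta}{1+\beta^2-2\beta\cos\omega}\Big\}. \tag{1}$$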
The beneficial feature of the power spectral density for this model is that it depends only upon the mean and variance of each of the two probability distributions and not on the complete distributions, and is thus fairly robust. The count spectrum [Cox and Lewis 1966, Ch.4], on the other hand, depends on the complete distributions.
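As a numerical check on (1), the sketch below (Python, standard library only) computes $\pi_1$, $\pi_2$, $\mu$, $\sigma^2$, $m$ and $\beta$ from the six parameters and compares the closed form with a truncated version of the cosine series. The function names and the example parameter values are hypothetical, chosen only for illustration; note that only the means and variances of $p_1(x)$ and $p_2(x)$ enter.

```python
import math


def model_moments(mu1, var1, mu2, var2, a1, a2):
    """Steady-state quantities of the two-state semi-Markov interval model."""
    pi1 = (1 - a2) / (2 - a1 - a2)
    pi2 = (1 - a1) / (2 - a1 - a2)
    mean = pi1 * mu1 + pi2 * mu2
    m = pi1 * pi2 * (mu1 - mu2) ** 2
    var = pi1 * var1 + pi2 * var2 + m
    beta = a1 + a2 - 1
    return {"pi1": pi1, "pi2": pi2, "mean": mean, "var": var, "m": m, "beta": beta}


def spectrum_closed_form(w, var, m, beta):
    """Equation (1): P_+(w) for 0 <= w <= pi."""
    c = math.cos(w)
    return (var + 2 * m * beta * (c - beta) / (1 + beta ** 2 - 2 * beta * c)) / math.pi


def spectrum_series(w, var, m, beta, terms=500):
    """(var/pi) * (1 + 2 * sum_k rho_k cos(k w)) with rho_k = m * beta**k / var, truncated."""
    s = sum(m * beta ** k / var * math.cos(k * w) for k in range(1, terms + 1))
    return var / math.pi * (1 + 2 * s)


# Hypothetical parameters: geometric long intervals (var1 = mu1**2 - mu1), short intervals.
q = model_moments(mu1=30.0, var1=870.0, mu2=5.0, var2=4.0, a1=0.6, a2=0.9)
for w in (0.1, 1.0, 2.5):
    print(w, spectrum_closed_form(w, q["var"], q["m"], q["beta"]),
          spectrum_series(w, q["var"], q["m"], q["beta"]))
```

The series converges geometrically because $|\beta| < 1$, so a few hundred terms suffice for moderate $\beta$; values of $\beta$ near one would need many more.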


B. PARAMETER ESTIMATION TECHNIQUE 

Lewis and Shedler [1973] used a modified method of mo- 
ments approach in order to estimate the parameters in their 
model. The standard method of moments procedure for param- 
eter estimation is to calculate theoretical moments in terms 
of the unknown parameters and equate them to empirical ob- 
servations of these moments. An alternative to this method 
is the method of maximum likelihood. In this method param- 
eter values are selected which maximize the joint probabil- 
ity density of the observed data. To accomplish this there 
is a need for some distributional assumptions. However, 
even for a simple model such as the univariate two-state 
semi-Markov model discussed here it is not possible to write
down the joint density of the intervals. Thus the following 
approximate technique was proposed and tried. 

It is known [Cox and Lewis 1966, Ch.5] that the periodogram $I(n)$, an estimate of the power spectral density $P(\omega)$ at $\omega_n$, is in general asymptotically exponentially distributed [Olshen 1967]. The periodogram is an unbiased estimate, i.e. $E[I(n)] = P(\omega_n)$; however, it is not a consistent estimate, since the variance of the exponential distribution is equal to the square of the mean, i.e. the variance does not decrease with increased sample size. Moreover, for $n_1 \ne n_2$, the periodogram points $I(n_1)$ and $I(n_2)$ are asymptotically independent. Thus for finite sample size $N$ an approximate likelihood function may be written by assuming that the periodogram points are independent with exponential distributions having mean value $P(\omega_n)$. This is the technique explored in this thesis.
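The exponential approximation and the lack of consistency can be seen numerically in the simplest special case. The sketch below is an illustration under an assumption not made in the text: the intervals are replaced by Gaussian white noise with variance $\sigma^2$, for which $P_+(\omega) = \sigma^2/\pi$ is flat and the ratios $I_+(n)/P_+(\omega_n)$ are exactly unit exponential. It uses NumPy's `numpy.fft.rfft`; the function name `ratio_moments` is hypothetical.

```python
import numpy as np


def ratio_moments(N, sigma=2.0, seed=0):
    """Mean and variance of I_+(n) / P_+(w_n), n = 1..N/2-1, for Gaussian white noise."""
    rng = np.random.default_rng(seed)
    y = sigma * rng.standard_normal(N)
    c = np.fft.rfft(y)                         # c[n] = sum_j y_j exp(-2 pi i n j / N)
    I_plus = 2 * np.abs(c[1:N // 2]) ** 2 / (2 * np.pi * N)
    ratio = I_plus / (sigma ** 2 / np.pi)      # flat spectrum of white noise
    return ratio.mean(), ratio.var()


for N in (4096, 16384):
    # Both moments stay near 1: the variance does not shrink as N grows.
    print(N, ratio_moments(N))
```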

The definition and development of the periodogram require the finite Fourier transform.

The finite Fourier transform was discussed by Cooley, Lewis and Welch [to be published in 1974]. Let $\{Y(j), j = 0, \dots, N-1\}$ be a sequence of $N$ real numbers. The finite Fourier transform of $Y(j)$ is then

$$a(n) = \frac{1}{N}\sum_{j=0}^{N-1} Y(j)\,e^{-2\pi i n j/N}, \qquad n = 0, \dots, N-1.$$

This sequence of complex numbers may also be written in the form

$$a(n) = \frac{1}{N}\Big\{\sum_{j=0}^{N-1} Y(j)\cos(j\omega_n) - i\sum_{j=0}^{N-1} Y(j)\sin(j\omega_n)\Big\},$$

where $\omega_n = 2\pi n/N$, $n = 0, 1, \dots, N-1$. The periodogram $I(n)$ is then

$$I(n) = \frac{N\,|a(n)|^2}{2\pi}$$

or

$$I(n) = \frac{1}{2\pi N}\Big\{\Big[\sum_{j=0}^{N-1} Y(j)\cos(j\omega_n)\Big]^2 + \Big[\sum_{j=0}^{N-1} Y(j)\sin(j\omega_n)\Big]^2\Big\}.$$

The periodogram is an even, periodic function and hence has only $[N/2]+1$ distinct values, where $[N/2]$ is the integer part of $N/2$. Hereafter in the discussion $N$ will refer to an even integer. Let $I_+(n) = 2I(n)$ be the estimate for $P_+(\omega_n)$.
It is readily seen that $I_+(0)$ is proportional to the square of the arithmetic average of the observed data; thus no new information is obtained from $I_+(0)$. Since $N/2$ is an integer, $\omega_{N/2} = \pi$ and $I_+(N/2)$ is proportional to the square of an alternating summation of the data. Both $I_+(0)$ and $I_+(N/2)$ were ignored in what follows, thus leaving $(N/2)-1$ periodogram points. It should be added that these two periodogram points, suitably normalized, have asymptotic $\chi^2$ distributions with one degree of freedom and not an exponential distribution.
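The points $I_+(n)$, $n = 1, \dots, (N/2)-1$, can be computed with any FFT routine. The sketch below uses NumPy's `numpy.fft.rfft` in place of the IMSL subroutine FFTR mentioned in Section III.B (a substitution, not the code used in this research), and checks it against the cosine and sine sums of the definition. The function names are hypothetical.

```python
import numpy as np


def periodogram_plus(y):
    """I_+(n) = 2 I(n) for n = 1, ..., N/2 - 1, with I(n) = N |a(n)|^2 / (2 pi)."""
    y = np.asarray(y, dtype=float)
    N = y.size
    if N % 2:
        raise ValueError("N must be even")
    c = np.fft.rfft(y)                 # c[n] = N * a(n) for n = 0, ..., N/2
    I = np.abs(c) ** 2 / (2 * np.pi * N)
    return 2 * I[1:N // 2]             # drop n = 0 and n = N/2


def periodogram_plus_direct(y):
    """Same quantity from the O(N^2) cosine and sine sums."""
    y = np.asarray(y, dtype=float)
    N = y.size
    j = np.arange(N)
    out = np.empty(N // 2 - 1)
    for n in range(1, N // 2):
        w = 2 * np.pi * n / N
        cs, sn = np.dot(y, np.cos(j * w)), np.dot(y, np.sin(j * w))
        out[n - 1] = 2 * (cs ** 2 + sn ** 2) / (2 * np.pi * N)
    return out


y = np.random.default_rng(1).exponential(10.0, size=64)
print(np.allclose(periodogram_plus(y), periodogram_plus_direct(y)))  # True
```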

Now there is sufficient information to begin the approximate maximum likelihood search for parameter estimates.

The parameters of this model that need estimation are the mean and variance of each of the two interval distributions and the two transition probabilities $\alpha_1$ and $\alpha_2$. As a vector these parameters will be labeled $\theta = (\mu_1, \sigma_1^2, \mu_2, \sigma_2^2, \alpha_1, \alpha_2)$ and individually, to simplify notation, as $\theta_j$, $j = 1, \dots, 6$, to stand for the parameter as an element of the vector $\theta$.


The approximate likelihood function can be written as

$$L(\theta) = \prod_{n=1}^{(N/2)-1} \frac{1}{P_+(\omega_n;\theta)}\, e^{-I_+(n)/P_+(\omega_n;\theta)},$$

which is equivalent to

$$L(\theta) = \Big\{\prod_{n=1}^{(N/2)-1} \frac{1}{P_+(\omega_n;\theta)}\Big\} \exp\Big\{-\sum_{n=1}^{(N/2)-1}\frac{I_+(n)}{P_+(\omega_n;\theta)}\Big\}.$$

A simpler function to work with, which has the same maximum as $L(\theta)$, is the log likelihood function $LL(\theta) = \ln L(\theta)$, where ln symbolizes the natural logarithm. Then

$$LL(\theta) = -\sum_{n=1}^{(N/2)-1}\ln P_+(\omega_n;\theta) - \sum_{n=1}^{(N/2)-1}\frac{I_+(n)}{P_+(\omega_n;\theta)}.$$

In the typical mathematical approach to finding an unconstrained maximum of a function, it is a necessary condition that all of the first-order partial derivatives of the function, with respect to the unknown parameters, be equal to zero, i.e. that

$$0 = \frac{\partial LL(\theta)}{\partial\theta_j} = LL_j = \sum_{n=1}^{(N/2)-1}\frac{I_+(n) - P_+(\omega_n;\theta)}{P_+^2(\omega_n;\theta)}\,P_j(\omega_n;\theta), \qquad j = 1, \dots, 6, \tag{2}$$

where $P_j$ and $LL_j$ are the first-order partial derivatives of $P_+(\omega_n;\theta)$ and $LL(\theta)$, respectively, with respect to parameter $\theta_j$. This process results in six equations in six unknown parameters. Parameter estimates are found by solving the system of equations simultaneously for each of the parameters, although this may not yield a unique maximum. If the system is of a simple form it may be possible to get at least a few closed-form solutions, which will reduce the size of the system.
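Equation (2) can be checked numerically against finite differences of $LL(\theta)$. In the sketch below (assuming NumPy) the two-parameter spectrum $P_+(\omega;\theta) = \theta_1 + \theta_2\cos\omega$ is a hypothetical stand-in chosen only because its derivatives are trivial; for the semi-Markov model one would substitute (1) and its partial derivatives $P_j$. All names are hypothetical.

```python
import numpy as np


def log_likelihood(I_plus, P):
    """LL(theta) = -sum ln P_+(w_n) - sum I_+(n) / P_+(w_n)."""
    return -np.sum(np.log(P)) - np.sum(I_plus / P)


def efficient_scores(I_plus, P, dP):
    """Equation (2): LL_j = sum_n (I_+(n) - P_+) / P_+**2 * P_j; dP has one row per theta_j."""
    return dP @ ((I_plus - P) / P ** 2)


def toy_spectrum(theta, w):
    """Hypothetical test spectrum P_+(w; theta) = theta[0] + theta[1] * cos(w) and its gradient."""
    P = theta[0] + theta[1] * np.cos(w)
    dP = np.vstack([np.ones_like(w), np.cos(w)])
    return P, dP


N = 128
w = 2 * np.pi * np.arange(1, N // 2) / N
theta = np.array([2.0, 0.5])
P, dP = toy_spectrum(theta, w)
I_plus = np.random.default_rng(3).exponential(P)   # exponential points with mean P_+

analytic = efficient_scores(I_plus, P, dP)
h = 1e-6
numeric = np.array([
    (log_likelihood(I_plus, toy_spectrum(theta + h * e, w)[0])
     - log_likelihood(I_plus, toy_spectrum(theta - h * e, w)[0])) / (2 * h)
    for e in np.eye(2)                              # central differences, one per theta_j
])
print(analytic, numeric)
```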

Once the parameter estimates have been found it is necessary to show that a maximum has been achieved. A sufficient condition for a (local) maximum is that the matrix of second-order partial derivatives be negative definite. The final phase in this approximate likelihood estimation process is to verify the predictive ability of the model. The verification may be done, using the estimated parameters, by calculating other theoretical properties of the model, such as the spectrum of counts discussed by Cox and Lewis [1966, Ch.4], which may then be compared with the corresponding empirical properties of the data. Note that the utility of the spectrum of intervals in the approximate likelihood analysis is that it does not depend on the complete distributional form for $p_1(x)$ and $p_2(x)$, while the spectrum of counts does. It will be seen later, however, that this independence leads to ill-conditioning in the solution of the maximum likelihood equations.
III. EXPERIMENTAL APPROACH


The original data, analyzed by Lewis and Shedler [1973], were not available for this research. In view of this fact, and since the purpose of the research was to evaluate the effectiveness of the previously described technique for parameter estimation, it was felt that data observed from a model with known parameters would better aid the evaluation process. With this in mind a simulation of the model described by Lewis and Shedler [1973] was constructed for the purpose of generating such data.


A. UNIVARIATE TWO-STATE SEMI-MARKOV SIMULATION 

The simulation, as well as the model, was subdivided
into three major subsections. The state transition matrix A 
was one subsection and the two distributions for intervals 
were the remaining two subsections. Lewis and Shedler [1973] 
postulated a geometric distribution for the long intervals 
and a negative binomial distribution for the shorter inter- 
vals. The parameters used for the simulation were those 
calculated by Lewis and Shedler [1973]. 

A Monte Carlo simulation such as this requires a pseudo-random number generator with favorable serial correlation properties. Learmonth and Lewis [1973] discussed such a generator called SRAND. SRAND returns an observation from a standardized uniform distribution on the interval (0,1). SRAND is a multiplicative generator with a multiplier of $7^5$ and a modulus of $2^{31}-1$.
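The multiplicative recursion behind SRAND can be sketched as follows. This is only the plain congruential step $x_{i+1} = 7^5 x_i \bmod (2^{31}-1)$ with the constants quoted above; any shuffling, seeding convention or other details of the actual SRAND routine are not described here and are not reproduced. The function name `multiplicative_uniforms` is hypothetical.

```python
def multiplicative_uniforms(seed, count):
    """Return (states, uniforms) from x_{i+1} = 7**5 * x_i mod (2**31 - 1)."""
    a, modulus = 7 ** 5, 2 ** 31 - 1
    if not 0 < seed < modulus:
        raise ValueError("seed must lie in 1, ..., 2**31 - 2")
    x, states = seed, []
    for _ in range(count):
        x = (a * x) % modulus      # Python integers avoid overflow
        states.append(x)
    # Each uniform lies strictly inside (0, 1) because 0 < x < modulus.
    return states, [s / modulus for s in states]


states, u = multiplicative_uniforms(seed=1, count=10000)
print(states[-1])  # 1043618065, the widely used check value for these constants and seed 1
```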
The geometric distribution is of the form

$$p_1(x) = (1-p_1)\,p_1^{x-1}, \qquad 0 < p_1 < 1;\ x = 1, 2, \dots,$$

with mean $\mu_1 = 1/(1-p_1)$ and variance $\sigma_1^2 = p_1/(1-p_1)^2$. Utilizing the survivor function of the geometric distribution, i.e. $\operatorname{prob}\{X > x\} = p_1^x$, $x = 1, 2, \dots$, a generator of geometric variates was obtained. It was of the form

$$X = \Big\lceil \frac{\ln R}{\ln p_1} \Big\rceil,$$

where $R$ was an observation from SRAND and the symbol $\lceil b \rceil$ signified the smallest integer greater than or equal to $b$.
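A sketch of the inverse-survivor-function generator, assuming Python's `random.Random` as the uniform source in place of SRAND. Drawing $R$ from $(0,1]$ via `1 - rng.random()` avoids $\ln 0$; the function name is hypothetical.

```python
import math
import random


def geometric_variate(p1, rng):
    """X = ceil(ln R / ln p1), so that P{X > x} = p1**x for x = 1, 2, ..."""
    r = 1.0 - rng.random()             # R in (0, 1]
    # R == 1 would give 0; map that single value to 1.
    return max(1, math.ceil(math.log(r) / math.log(p1)))


rng = random.Random(12345)
p1 = 0.9                               # mean 1/(1-p1) = 10, variance p1/(1-p1)**2 = 90
xs = [geometric_variate(p1, rng) for _ in range(100000)]
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)
print(mean, var)
```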

The negative binomial distribution is of the form

$$p_2(x) = \frac{\Gamma(k+x-1)}{\Gamma(k)\,(x-1)!}\,(1-p_2)^k\,p_2^{x-1},$$

$0 < p_2 < 1$, $k > 0$, $x = 1, 2, \dots$, with mean $\mu_2 = 1 + \{kp_2/(1-p_2)\}$ and variance $\sigma_2^2 = kp_2/(1-p_2)^2$. Let $X \mid \lambda$, denoting $X$ given a fixed value of $\lambda$, be distributed as a Poisson random variable with parameter $\lambda$. Now let $\lambda$ have a gamma distribution with parameters $k$ and $\eta$,

$$f(\lambda) = \frac{\eta^k\lambda^{k-1}e^{-\eta\lambda}}{\Gamma(k)}, \qquad \lambda > 0.$$

It can be shown using generating functions that the unconditioned $X$ (on $0, 1, 2, \dots$) has a negative binomial distribution with parameters $k$ and $p_2 = 1/(1+\eta)$.

To calculate a gamma variate with parameter $k$, a positive real number, it was necessary to employ Johnk's technique [1964] for generating variates with the fractional part of $k$. Let $\bar k$ be the integer part of $k$ if $k \ge 1$, or zero if $k < 1$, and let $\kappa$ be the fractional part of $k$. The sum, $\lambda_1$, of $\bar k$ exponentially distributed random variables with parameter $\eta$ has a gamma distribution with parameters $\bar k$ and $\eta$. In Johnk's technique let $U_1$, $U_2$ and $U_3$ be independent and identically distributed observations from a uniform distribution on the interval (0,1) such that

$$Y = U_1^{1/\kappa} + U_2^{1/(1-\kappa)} \le 1.$$

If $Y > 1$, new observations for $U_1$ and $U_2$ should be obtained. Then for $Z = U_1^{1/\kappa}/Y$ and $E = -\ln U_3$, $\lambda_2 = (Z \times E)/\eta$ has a gamma distribution with parameters $\kappa$ and $\eta$. Finally $\lambda = \lambda_1 + \lambda_2$ has the required gamma distribution with parameters $k$ and $\eta$.
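The procedure just described can be sketched as below, with Python's `random.Random` standing in for SRAND. Exponential variates with parameter $\eta$ are taken as $-\ln U/\eta$, i.e. $\eta$ is treated as a rate, consistent with $f(\lambda)$ above. The function name is hypothetical.

```python
import math
import random


def gamma_variate(k, eta, rng):
    """Gamma(k, eta) variate (density eta**k lam**(k-1) exp(-eta lam) / Gamma(k))."""
    k_bar = math.floor(k)                  # integer part (zero when k < 1)
    kappa = k - k_bar                      # fractional part
    # lambda_1: sum of k_bar exponentials with parameter eta.
    lam1 = sum(-math.log(1.0 - rng.random()) for _ in range(k_bar)) / eta
    lam2 = 0.0
    if kappa > 0:
        # Johnk: accept when Y = U1**(1/kappa) + U2**(1/(1-kappa)) <= 1.
        while True:
            a = (1.0 - rng.random()) ** (1.0 / kappa)
            b = (1.0 - rng.random()) ** (1.0 / (1.0 - kappa))
            y = a + b
            if y <= 1.0:
                break
        z = a / y                          # Beta(kappa, 1 - kappa)
        e = -math.log(1.0 - rng.random())  # E = -ln U3, unit exponential
        lam2 = z * e / eta                 # Gamma(kappa, eta)
    return lam1 + lam2


rng = random.Random(2024)
draws = [gamma_variate(2.5, 0.5, rng) for _ in range(50000)]
mean = sum(draws) / len(draws)             # theory: k / eta = 5
var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)  # theory: k / eta**2 = 10
print(mean, var)
```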

The generation of Poisson random variates with parameter $\lambda$ was accomplished by letting $X$ be equal to the largest integer $n$ such that, for a sequence of independent identically distributed uniform random variates $\{U_i\}$ from the interval (0,1),

$$U_1 \times U_2 \times \dots \times U_n > e^{-\lambda}.$$

If $U_1 \le e^{-\lambda}$ then $X = 0$. $X$ is then distributed as a Poisson variate with parameter $\lambda$.
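A sketch of the product-of-uniforms Poisson generator, combined with the gamma mixing described above to produce negative binomial intervals. Two assumptions are labeled in the code: Python's `random.Random.gammavariate(alpha, beta)` (whose second argument is a scale, hence $1/\eta$) stands in for the Johnk generator, and one is added to the Poisson count so that the result lives on $x = 1, 2, \dots$ like $p_2(x)$. The product method needs on average $\lambda + 1$ uniforms and is intended for moderate $\lambda$. Function names are hypothetical.

```python
import math
import random


def poisson_variate(lam, rng):
    """Largest n with U1 * ... * Un > exp(-lam); returns 0 when U1 <= exp(-lam)."""
    limit = math.exp(-lam)
    n, prod = 0, rng.random()
    while prod > limit:
        n += 1
        prod *= rng.random()
    return n


def short_interval(k, eta, rng):
    """Negative binomial interval via a gamma mixture of Poisson variates."""
    lam = rng.gammavariate(k, 1.0 / eta)   # assumption: library gamma in place of Johnk's method
    return 1 + poisson_variate(lam, rng)   # assumption: shift to the support x = 1, 2, ...


rng = random.Random(99)
k, eta = 2.0, 0.5                          # p2 = 1/(1+eta) = 2/3
xs = [short_interval(k, eta, rng) for _ in range(50000)]
mean = sum(xs) / len(xs)                   # theory: 1 + k*p2/(1-p2) = 5
var = sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)  # theory: k*p2/(1-p2)**2 = 12
print(mean, var)
```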


B. CALCULATION OF THE PERIODOGRAM 

The finite Fourier transform discussed in Section II.B above requires on the order of $N^2$ complex operation pairs, i.e. a multiplication and an addition. For large $N$ this can be very costly in terms of calculation time. Cooley, Lewis and Welch [1970] discussed the use of a fast Fourier transform algorithm which only requires on the order of $N(r_1 + \dots + r_m)$ complex operation pairs, where $N = r_1 \times \dots \times r_m$, i.e. each $r$ is a factor of $N$. The International Mathematical and Statistical Library [1973 revision] contains a computer subroutine, FFTR, which computes the fast Fourier transform of a real data sequence. For $N = 820$, as in this research, the fast Fourier transform algorithm used only six percent of the number of complex operation pairs required by the straightforward calculation method. Thus a significant saving in computer operating time was realized.
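The six percent figure can be checked from the factorization of $N$. The sketch below counts $N(r_1+\dots+r_m)$ against $N^2$ using the prime factors of $N$; taking the prime factorization is an assumption about which factors the FFT uses (mixed-radix routines may also use composite factors such as 4). Function names are hypothetical.

```python
def prime_factors(n):
    """Prime factors of n in nondecreasing order, with multiplicity."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


def fft_cost_ratio(n):
    """N * (r_1 + ... + r_m) complex operation pairs relative to N**2."""
    return n * sum(prime_factors(n)) / n ** 2


print(prime_factors(820), round(fft_cost_ratio(820), 3))  # [2, 2, 5, 41] 0.061
```

Here $820 = 2 \times 2 \times 5 \times 41$, so the ratio is $50/820 \approx 0.061$.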

Utilizing the previously described equations, the periodogram $I_+(n)$ was computed and then used in a test of fit to the power spectral density $P_+(\omega_n)$. Cox and Lewis [1966, Ch.6] described a test based on the uniform distribution. Since the periodogram has, asymptotically, an exponential distribution with mean $P_+(\omega_n)$, the quantity $I_+(n)/P_+(\omega_n)$ has an exponential distribution with mean one. This is true for each of the $(N/2)-1$ periodogram points. If all $(N/2)-1$ of these quantities are cumulatively summed, the total gives an interval of length $\sum_{n=1}^{(N/2)-1} I_+(n)/P_+(\omega_n)$ over which the $(N/2)-2$ intermediate partial sums are dispersed as points. The intervals between these points are each, hypothetically, an observation from a unit exponential distribution, i.e. the points form a Poisson process. It is a well known fact of the Poisson process that, given that $M$ points are in an interval, the $M$ points are dispersed uniformly over the interval. Thus the sequence $\{U_{(i)}, i = 1, \dots, (N/2)-2\}$, where

$$U_{(i)} = \frac{\sum_{n=1}^{i} I_+(n)/P_+(\omega_n)}{\sum_{n=1}^{(N/2)-1} I_+(n)/P_+(\omega_n)},$$

are uniform order statistics. The empirical cumulative distribution function for these quantities was then compared with the uniform cumulative distribution function using the Kolmogorov-Smirnov test. The null hypothesis is that the sequence $\{U_{(i)}\}$ is formed of uniform order statistics, while the alternative hypothesis remains unspecified. Lilliefors [1969] found that the critical values of the K-S test are too conservative when testing for exponential distributions where the mean has been estimated, as in this case. Too conservative means that the listed critical value for a level of significance $\alpha$ actually has a level of significance less than $\alpha$. If the above test, with modified percentage points, accepts the null hypothesis, then the assumption of a semi-Markov model for the data is supported.
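A sketch of the uniform-order-statistics test, assuming NumPy. It forms $U_{(i)}$ from the cumulative sums of $I_+(n)/P_+(\omega_n)$ and returns the Kolmogorov-Smirnov distance $D$ together with $\sqrt{M}\,D$, where $M = (N/2)-2$ is the number of points; reporting the scaled form, for comparison with tabulated asymptotic critical values such as the 1.25 quoted in Section IV, is an assumption about how the statistic was normalized. The function name and the example spectrum are hypothetical.

```python
import numpy as np


def ks_periodogram_test(I_plus, P_plus):
    """Return (D, sqrt(M) * D) for U_(i), i = 1..M, against the uniform c.d.f."""
    r = np.asarray(I_plus, dtype=float) / np.asarray(P_plus, dtype=float)
    c = np.cumsum(r)
    u = c[:-1] / c[-1]                 # (N/2) - 2 ordered points in (0, 1)
    M = u.size
    i = np.arange(1, M + 1)
    d_plus = np.max(i / M - u)         # empirical c.d.f. above the uniform
    d_minus = np.max(u - (i - 1) / M)  # empirical c.d.f. below the uniform
    D = max(d_plus, d_minus)
    return D, np.sqrt(M) * D


rng = np.random.default_rng(4)
P = 1.0 + 0.8 * np.cos(np.linspace(0.1, 3.0, 409))   # hypothetical spectrum values
I = rng.exponential(P)                                # points consistent with P
print(ks_periodogram_test(I, P))
```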

In order to test the periodogram it was necessary to know $P_+(\omega_n)$. As discussed earlier, the correlation coefficient of lag $k$ is $\rho_k = m\beta^k/\sigma^2$ for this model. Let $\hat\gamma(0)$, $\hat\gamma(1)$ and $\hat\gamma(2)$ be estimates of the variance and of the covariances of lags one and two, respectively, for the intervals. Then

$$\hat\gamma(1) = \sigma^2\rho_1 = m\beta \qquad\text{and}\qquad \hat\gamma(2) = \sigma^2\rho_2 = m\beta^2.$$

Solving simultaneously for $m$ and $\beta$, the estimates of $m$ and $\beta$ are

$$\hat\beta = \frac{\hat\gamma(2)}{\hat\gamma(1)} \qquad\text{and}\qquad \hat m = \frac{\hat\gamma(1)^2}{\hat\gamma(2)}.$$

From (1) an estimate for $P_+(\omega_n)$ was

$$\hat P_+(\omega_n) = \frac{1}{\pi}\Big\{\hat\gamma(0) + 2\hat m\hat\beta\,\frac{\cos\omega_n - \hat\beta}{1+\hat\beta^2-2\hat\beta\cos\omega_n}\Big\}.$$

These estimates were then used in the computations for the sequence $\{U_{(i)}\}$.
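The moment estimates $\hat\beta$, $\hat m$ and $\hat P_+(\omega_n)$ translate directly into code. The sketch below assumes NumPy and uses the divisor-$N$ autocovariance estimator, which is an assumption (the text does not say which estimator of $\hat\gamma(k)$ was used); it also presumes $\hat\gamma(1)$ and $\hat\gamma(2)$ are not near zero. Function names are hypothetical.

```python
import numpy as np


def autocovariance(x, lag):
    """Sample autocovariance with divisor N (one common convention)."""
    d = np.asarray(x, dtype=float)
    d = d - d.mean()
    return np.dot(d[: d.size - lag], d[lag:]) / d.size


def m_beta_from_covariances(g1, g2):
    """Solve g1 = m * beta and g2 = m * beta**2 for (m, beta)."""
    beta = g2 / g1
    return g1 ** 2 / g2, beta


def spectrum_estimate(g0, m, beta, N):
    """hat P_+(w_n), n = 1, ..., N/2 - 1, from equation (1)."""
    w = 2 * np.pi * np.arange(1, N // 2) / N
    c = np.cos(w)
    return (g0 + 2 * m * beta * (c - beta) / (1 + beta ** 2 - 2 * beta * c)) / np.pi


def estimate_from_intervals(x):
    """(hat m, hat beta, hat P_+) from a sequence of intervals."""
    g0, g1, g2 = (autocovariance(x, k) for k in range(3))
    m, beta = m_beta_from_covariances(g1, g2)
    return m, beta, spectrum_estimate(g0, m, beta, len(x))
```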


C. SOLVING SIMULTANEOUS EQUATIONS FOR THE PARAMETERS 

The system of equations defined by (2) and (1) was extremely complex, with no hope of finding a closed-form solution for any of the parameters. The system was reduced, however, by noting from the geometric distribution assumption that the variance for the long intervals was a function only of the parameter $p_1$, which also was the only parameter in the mean. It was a simple matter then to find the variance as a function of the mean, $\sigma_1^2 = \mu_1^2 - \mu_1$, which reduced the system to only five unknown parameters. The system was still complex and required some iterative method for solution.
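For completeness, the reduction used here follows in one line from the mean and variance of the geometric distribution given in Section III.A:

$$\mu_1 = \frac{1}{1-p_1} \;\Longrightarrow\; p_1 = \frac{\mu_1 - 1}{\mu_1}, \qquad \sigma_1^2 = \frac{p_1}{(1-p_1)^2} = \frac{\mu_1 - 1}{\mu_1}\,\mu_1^2 = \mu_1^2 - \mu_1.$$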

Rao [1965] suggested an iterative method which he called the Method of Scoring. He called $LL_j$ the $j$th efficient score. The approach for this method was to assume some initial trial solution. Using a first-degree Taylor expansion of the efficient scores about the trial solution, a system of linear equations was derived from which an additive correction to the trial solution was found. The iterations were repeated until the additive corrections became negligible.
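Specifically, let $\theta_1^0, \dots, \theta_5^0$ be the trial values for the unknown parameters. From (2), a first-degree Taylor expansion gives

$$LL_j(\theta) \approx LL_j(\theta^0) + \sum_{i=1}^{5}\frac{\partial^2 LL(\theta)}{\partial\theta_i\,\partial\theta_j}\Big|_{\theta=\theta^0}\,\delta\theta_i = 0, \qquad j = 1, \dots, 5,$$

where $\delta\theta_i = \theta_i - \theta_i^0$. Let $S_j^0 = \partial LL(\theta)/\partial\theta_j$, evaluated at $\theta = \theta^0$, be the first-order partial derivative of $LL(\theta)$ with respect to $\theta_j$, and also let $T_{ij}^0 = \partial^2 LL(\theta)/(\partial\theta_i\,\partial\theta_j)$, evaluated at $\theta = \theta^0$. Then

$$S_j^0 + \sum_{i=1}^{5} T_{ij}^0\,\delta\theta_i = 0, \qquad j = 1, \dots, 5,$$

was a system of linear equations with five unknowns. In matrix notation the system had the form $-T\,\delta\theta = S$. Finally, the additive corrections were obtained from the equation $\delta\theta = -T^{-1}S$, where $T^{-1}$ was the inverse of the matrix $T$, assuming $T$ was nonsingular. The new trial solution then became

$$\theta^1 = \theta^0 + \delta\theta.$$

Rao [1965] explained that the variance of the final estimate $\hat\theta_j$ of $\theta_j$ was approximated by the $j$th diagonal element of the matrix $(-T^{-1})$. Recalling that the matrix of second-order partial derivatives of $LL(\theta)$ should be negative definite, it follows that $-T$ and $(-T^{-1})$ were both positive definite.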
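The iteration can be sketched generically, assuming NumPy and user-supplied functions for the scores $S$ and the matrix $T$ of second partial derivatives. As written in the text, $T$ is the observed matrix of second derivatives, which makes the update a Newton-Raphson step; Fisher scoring in the strict sense replaces $T$ by its expected value. The sketch also carries out the definiteness check discussed in Section II.B, using a Cholesky factorization of $-T$. The toy one-parameter likelihood in the usage (a flat spectrum, whose maximum is the mean of the $I_+(n)$) and all names are hypothetical.

```python
import numpy as np


def method_of_scoring(theta0, score, second_partials, tol=1e-10, max_iter=100):
    """Iterate theta <- theta + delta with -T delta = S until max|delta| < tol."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        S, T = score(theta), second_partials(theta)
        delta = np.linalg.solve(-T, S)      # same as -T^{-1} S, without forming the inverse
        theta = theta + delta
        if np.max(np.abs(delta)) < tol:
            break
    else:
        raise RuntimeError("no convergence within max_iter iterations")
    T = second_partials(theta)
    try:
        np.linalg.cholesky(-T)              # succeeds only if -T is positive definite
        is_maximum = True
    except np.linalg.LinAlgError:
        is_maximum = False
    variances = np.diag(np.linalg.inv(-T))  # approximate variances of the estimates
    return theta, is_maximum, variances


# Toy case: P_+(w_n; theta) = theta for every n, so LL = -M ln theta - sum(I) / theta.
I_plus = np.random.default_rng(8).exponential(3.0, size=200)
M, total = I_plus.size, I_plus.sum()


def score(t):
    return np.array([-M / t[0] + total / t[0] ** 2])


def second(t):
    return np.array([[M / t[0] ** 2 - 2 * total / t[0] ** 3]])


theta_hat, ok, var = method_of_scoring([0.9 * I_plus.mean()], score, second)
print(theta_hat, I_plus.mean(), ok, var)
```

In the toy case the maximum is at the mean of the $I_+(n)$, and the approximate variance equals its square divided by $M$; a poor starting value can make Newton-type steps diverge, which is one reason initial estimates matter.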

In order to apply the method of scoring it was necessary to determine initial estimates of the parameters. The mean and variance of the intervals and the parameters $m$ and $\beta$ have all been estimated. Utilizing the marginal properties of the model and the method of moments, a system of four equations was developed which was of the form

$$\bar X = \frac{(1-\alpha_2)\mu_1 + (1-\alpha_1)\mu_2}{1-\hat\beta},$$

$$\hat\gamma(0) = \frac{(1-\alpha_2)(\mu_1^2-\mu_1) + (1-\alpha_1)\sigma_2^2}{1-\hat\beta} + \hat m,$$

$$\hat m = \frac{(1-\alpha_1)(1-\alpha_2)(\mu_1-\mu_2)^2}{(1-\hat\beta)^2},$$

$$\hat\beta = \alpha_1 + \alpha_2 - 1,$$

where $\bar X$ was the estimate for the mean of the intervals and the quantity $(\mu_1^2 - \mu_1)$ was the estimate for $\sigma_1^2$ after making the geometric distribution assumption. From this system initial estimates for four parameters, as functions of the fifth parameter, were found and had the form

$$\mu_2 = \mu_1 - \frac{\hat m + (\mu_1 - \bar X)^2}{\mu_1 - \bar X},$$

$$\alpha_1 = 1 - \frac{(1-\hat\beta)(\mu_1 - \bar X)}{\mu_1 - \mu_2}, \qquad \alpha_2 = 1 - \frac{(1-\hat\beta)(\bar X - \mu_2)}{\mu_1 - \mu_2},$$

$$\sigma_2^2 = \frac{(1-\hat\beta)\{\hat\gamma(0) - \hat m\} - (1-\alpha_2)(\mu_1^2 - \mu_1)}{1-\alpha_1}.$$

It only remained then to estimate the parameter $\mu_1$.
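The closed-form initial estimates can be checked by a round trip: compute $\bar X$, $\hat\gamma(0)$, $\hat m$ and $\hat\beta$ from known parameters, then recover those parameters from $\mu_1$. The sketch below is plain Python; the function name and parameter values are hypothetical, and it requires $\mu_1 \ne \bar X$.

```python
def initial_estimates(mu1, xbar, g0, m, beta):
    """mu2, sigma2^2, alpha1, alpha2 as functions of mu1 (geometric: sigma1^2 = mu1^2 - mu1)."""
    if mu1 == xbar:
        raise ValueError("mu1 must differ from the sample mean")
    d = (m + (mu1 - xbar) ** 2) / (mu1 - xbar)     # mu1 - mu2
    mu2 = mu1 - d
    alpha1 = 1 - (1 - beta) * (mu1 - xbar) / d
    alpha2 = 1 - (1 - beta) * (xbar - mu2) / d
    var2 = ((1 - beta) * (g0 - m) - (1 - alpha2) * (mu1 ** 2 - mu1)) / (1 - alpha1)
    return mu2, var2, alpha1, alpha2


# Round trip with hypothetical true values mu1=30, mu2=5, sigma2^2=4, alpha1=0.6, alpha2=0.9.
mu1, mu2, var2, a1, a2 = 30.0, 5.0, 4.0, 0.6, 0.9
beta = a1 + a2 - 1
pi1, pi2 = (1 - a2) / (1 - beta), (1 - a1) / (1 - beta)
m = pi1 * pi2 * (mu1 - mu2) ** 2
xbar = pi1 * mu1 + pi2 * mu2
g0 = pi1 * (mu1 ** 2 - mu1) + pi2 * var2 + m
print(initial_estimates(mu1, xbar, g0, m, beta))  # approximately (5.0, 4.0, 0.6, 0.9)
```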

Lewis and Shedler [1973] explained that their estimate of $\mu_1$ involved an eyeball judgement of where linearity began in the tail of the log survivor function of the data. This linearity in the tail led to the postulate of the geometric distribution for the long intervals. Since only an initial estimate was needed, their method of estimating $\mu_1$ was utilized again. The value of the interval where linearity began was subtracted from each greater interval, and the arithmetic average of these differences was then taken as the estimate of $\mu_1$. Now all parameters had been initially estimated, and the iterative method of scoring was applied to stabilize these estimates.
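The tail-average estimate of $\mu_1$ works because the geometric distribution is memoryless: for an integer threshold $x_0$, the excess $X - x_0$ of a geometric interval exceeding $x_0$ is again geometric with mean $\mu_1$. The sketch below (plain Python) applies it to hypothetical simulated data; the threshold is supplied by hand, standing in for the eyeball judgement of where the log survivor function becomes linear. Function names are hypothetical.

```python
import math
import random


def tail_mean_excess(intervals, x0):
    """Average of (x - x0) over intervals x > x0; estimates mu1 if the tail is geometric."""
    excess = [x - x0 for x in intervals if x > x0]
    if not excess:
        raise ValueError("no intervals exceed the threshold")
    return sum(excess) / len(excess)


def log_survivor(intervals, x):
    """ln of the empirical fraction of intervals exceeding x (to inspect tail linearity)."""
    frac = sum(1 for v in intervals if v > x) / len(intervals)
    return math.log(frac) if frac > 0 else float("-inf")


rng = random.Random(11)
p1 = 0.9                                            # hypothetical: mu1 = 10
long_ones = [math.ceil(math.log(1.0 - rng.random()) / math.log(p1)) for _ in range(20000)]
short_ones = [rng.randint(1, 5) for _ in range(80000)]
data = long_ones + short_ones
print(tail_mean_excess(data, x0=20))                # close to 10
print([round(log_survivor(data, x), 2) for x in (20, 30, 40)])
```

In the tail the log survivor values fall by roughly $-\ln p_1 \approx 0.105$ per unit of $x$, which is the linearity the eyeball judgement looks for.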
IV. RESULTS AND CONCLUSIONS


As the research for this thesis progressed, two areas developed results which need to be discussed. The first of these was the test for justification of the exponential distribution assumption for the periodogram points. A subroutine called KSTEST was written to conduct this test. Part of the output of this subroutine was the Kolmogorov-Smirnov statistic, which was then compared to a critical value from the distribution proposed by Lilliefors [1969] for the case where the mean of the exponential distribution had to be estimated. The 0.99 quantile of that distribution, i.e. a one percent level of significance, was 1.25. It was noted that, at this level, approximately six percent of the four thousand trials made were rejected as not having produced periodograms from a semi-Markov model.

In addition to testing the hypothesis for each simulation, another benefit was received. Since the testing did not strictly conform to that discussed by Lilliefors [1969], because each periodogram point had a different mean, it was felt that, for this case, quantiles of the distribution should be estimated. The four thousand values of the statistic were obtained from four computer runs, each containing one thousand simulations. For each run, the data were divided into ten sections in serial order, i.e. the first one hundred points were the first section, etc. The elements of each section were ordered and the 0.80, 0.85, 0.90, 0.95 and 0.99 empirical quantiles were observed. This resulted in ten observations for each quantile, from which a mean and variance were estimated. Lastly, the entire data for the run were ordered and the five quantiles were observed. Thus, for each run, each of the five empirical quantiles had been observed and had an estimated mean and variance. Finally, a mean of the four overall observations for each quantile and a mean of the section means were computed. The results are shown in Table I.
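The sectioning scheme can be sketched as follows in plain Python. The order-statistic convention for an empirical quantile (the $\lceil np \rceil$th smallest value) is an assumption, since the text does not state which convention was used; function names are hypothetical.

```python
import math
import statistics

PROBS = (0.80, 0.85, 0.90, 0.95, 0.99)


def empirical_quantile(sorted_values, p):
    """The ceil(n*p)-th smallest value (rounding guards against float error in n*p)."""
    n = len(sorted_values)
    idx = max(1, math.ceil(round(n * p, 9)))
    return sorted_values[idx - 1]


def summarize_run(stats, sections=10):
    """Per-quantile mean and variance over serial sections, plus whole-run quantiles."""
    size = len(stats) // sections
    parts = [sorted(stats[i * size:(i + 1) * size]) for i in range(sections)]
    section_mean, section_var = {}, {}
    for p in PROBS:
        qs = [empirical_quantile(part, p) for part in parts]
        section_mean[p] = statistics.mean(qs)
        section_var[p] = statistics.variance(qs)
    ordered = sorted(stats)
    whole_run = {p: empirical_quantile(ordered, p) for p in PROBS}
    return section_mean, section_var, whole_run
```

Over four runs, the "mean of runs" is then the average of the four `whole_run[p]` values and the "mean of means" the average of the four `section_mean[p]` values.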

Lilliefors [1967] discussed the Kolmogorov-Smirnov test 
for normal data and calculated numerically the quantiles of 
the test statistic for the case where the mean and variance 
of the normal distribution must be estimated. These quan- 
tiles are included for comparison. 

The second of the two significant areas was the estima- 
tion of parameters. The subroutine ESTIM8 was written, in 
double precision, to utilize the method of scoring for pa- 
rameter estimation. As a result of the use of the subrou- 
tine several potential hazards to the proposed technique 
became visible. 

The first of these hazards was the disparity between magnitudes of the five unknown parameters. Three of these are means and variances, and the other two are probabilities, which are always less than or equal to one. This problem became apparent when the magnitudes of the scores and of the elements in the matrix of second-order partial derivatives were seen. An attempt to correct this problem was made by dividing the three large parameters by the overall mean of the intervals. This also favorably affected the partial derivatives involving these parameters. The desired effect was achieved in that the gap between magnitudes was narrowed; however, the parameter estimates that resulted from this modification were only about one percent different from the parameters obtained earlier, so apparently the disparity created no significant problem.

TABLE I

                                          Level of Significance
Source                              0.20    0.15    0.10    0.05    0.01
Usual quantiles                      --     1.14     --      --      --
Lilliefors quantiles                 --      --      --      --     1.25
Run 1               Mean             --      --      --      --      --
                    Variance         --      --      --      --      --
Run 2               Mean             --      --      --      --      --
                    Variance         --      --      --      --      --
Run 3               Mean             --      --      --      --      --
                    Variance         --      --      --      --      --
Run 4               Mean             --      --      --      --      --
                    Variance         --      --      --      --      --
Mean of Runs                         --      --      --      --      --
Variance of Runs                     --      --      --      --      --
Mean of Means                        --      --      --      --      --
Variance of Means                    --      --      --      --      --
Lilliefors normal quantiles         0.736   0.768   0.805   0.886   1.031

The second of these problems was that the final parameter estimates, overall, appeared to have little relationship to the marginal parameter values from which the data were generated. Similarly, parameter estimates for two sets of [illegible] differed greatly in magnitude, and at times in sign, even when the periodogram was accepted as a close fit to the power spectral density. Differences in sign were extremely disturbing, since all of the parameters were expected to be greater than zero.

A third problem, related to the second, was that the results failed, numerically, to establish that the matrix of second-order partial derivatives was negative definite. Similarly, the negative inverse of that matrix could not be shown to be positive definite. This problem indicated either that a maximum had not been achieved, even though a cut-off criterion of $10^{-10}$ was used to test for convergence, or that, due to round-off error, the properties of a maximum could not be detected. With a smaller cut-off criterion the process would not converge and had to be terminated.
Finally, in a few instances when four of the five unknown parameters appeared close to the simulation parameters, the initial value of the fifth parameter was changed and the subroutine was restarted. The parameters would again converge; however, the final values in some cases changed drastically, even to the point of changing sign.

Some of these problems may have been caused by an ill-conditioned system of equations, while others might be due to the lack of a powerful iterative technique for the solution of a system of equations that has, perhaps, poor initial estimates. In any case it should be clear that the use of second-order properties of a model might simplify, or at least aid, the parameter estimation process. One proposed modification to the technique discussed in this thesis was to use a mixture of the method of moments approach on the marginal distribution of the intervals and the maximum likelihood approach on the second-order properties to estimate parameters.

In conclusion it should be recalled that model fitting and parameter estimation for univariate point processes is almost a completely open field, and that attempts, even unsuccessful ones, are needed in order to break through the barrier of inadequate methodology.